Convolutional Rectifier Networks as Generalized Tensor Decompositions

Authors

  • Nadav Cohen
  • Amnon Shashua
Abstract

Convolutional rectifier networks, i.e. convolutional neural networks with rectified linear activation and max or average pooling, are the cornerstone of modern deep learning. However, despite their wide use and success, our theoretical understanding of the expressive properties that drive these networks is partial at best. On the other hand, we have a much firmer grasp of these issues in the world of arithmetic circuits. Specifically, it is known that convolutional arithmetic circuits possess the property of "complete depth efficiency", meaning that besides a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be implemented (or even approximated) by a shallow network. In this paper we describe a construction based on generalized tensor decompositions that transforms convolutional arithmetic circuits into convolutional rectifier networks. We then use mathematical tools available from the world of arithmetic circuits to prove new results. First, we show that convolutional rectifier networks are universal with max pooling but not with average pooling. Second, and more importantly, we show that depth efficiency is weaker with convolutional rectifier networks than it is with convolutional arithmetic circuits. This leads us to believe that developing effective methods for training convolutional arithmetic circuits, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.
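The key device in this construction is the generalized tensor product, in which the multiplication underlying ordinary tensor products is replaced by an operator induced by the network's activation and pooling functions. The following is a minimal numpy sketch of that idea under stated assumptions: the operator for ReLU activation with max pooling is taken to be g(a, b) = max(a, b, 0), and the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def generalized_outer(u, v, g):
    """Order-2 generalized tensor product: result[i, j] = g(u[i], v[j])."""
    return g(u[:, None], v[None, :])

# Ordinary multiplication recovers the standard outer product
# (the convolutional arithmetic circuit case).
prod_op = lambda a, b: a * b

# Assumption: ReLU activation with max pooling induces the
# activation-pooling operator g(a, b) = max(a, b, 0).
relu_max_op = lambda a, b: np.maximum(np.maximum(a, b), 0.0)

u = np.array([1.0, -2.0, 3.0])
v = np.array([-1.0, 4.0])

print(generalized_outer(u, v, prod_op))      # 3x2 rank-1 outer product
print(generalized_outer(u, v, relu_max_op))  # generalized counterpart
```

Swapping the operator while keeping the decomposition's structure fixed is what lets results about arithmetic circuits be transported to rectifier networks.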

Related articles

Convolutional Rectifier Networks as Generalized Tensor Decompositions

In this paper we analyze the expressiveness of networks, i.e. the functions they can realize, through the notion of grid tensors. Recall from sec. 4 that given templates x^(1), …, x^(M) ∈ ℝ^s, the grid tensor of a score function h_y : (ℝ^s)^N → ℝ realized by some network is defined to be a tensor of order N and dimension M in each mode, denoted A(h_y), and given by eq. 3. In particular, it is a tensor holding...

Full text
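Concretely, the grid tensor defined in the excerpt above is an exhaustive table of network outputs: entry (d_1, …, d_N) holds the score assigned to the input assembled from templates x^(d_1), …, x^(d_N). Below is a brute-force numpy sketch of that construction; the names (grid_tensor, h_y) and the toy score function are hypothetical, and the M^N evaluation cost makes this an illustration of the definition only.

```python
import itertools
import numpy as np

def grid_tensor(score_fn, templates, N):
    """Brute-force grid tensor A(h_y): order N, dimension M in each mode,
    with entry (d_1, ..., d_N) = score_fn(x^(d_1), ..., x^(d_N))."""
    M = len(templates)
    A = np.empty((M,) * N)
    for idx in itertools.product(range(M), repeat=N):
        A[idx] = score_fn(np.stack([templates[d] for d in idx]))
    return A

# Toy example: M = 3 templates in R^2, with a hypothetical score function
# (product over patches of coordinate sums) standing in for h_y.
templates = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
h_y = lambda patches: float(np.prod(patches.sum(axis=1)))
print(grid_tensor(h_y, templates, N=2).shape)  # (3, 3)
```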

Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions

The driving force behind convolutional networks, the most successful deep learning architecture to date, is their expressive power. Despite its wide acceptance and vast empirical evidence, formal analyses supporting this belief are scarce. The primary notions for formally reasoning about expressiveness are efficiency and inductive bias. Expressive efficiency refers to the ability of a network ...

Full text

Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal-factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of each dictionary atom and the number of dictionary atoms. The model is adapted for online learning, which allows dictionary learning on large data...

Full text
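For context on the building block named in the excerpt above, a rank-R Kruskal (CP) decomposition writes an order-3 tensor as a sum of R rank-1 terms, T = Σ_r a_r ⊗ b_r ⊗ c_r. The numpy sketch below reconstructs a tensor from such factors; the shapes and names are illustrative assumptions, and this is not the KFA model itself.

```python
import numpy as np

def kruskal_reconstruct(A, B, C):
    """Rank-R Kruskal (CP) reconstruction of an order-3 tensor:
    T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 2  # mode dimensions and rank (illustrative)
T = kruskal_reconstruct(rng.standard_normal((I, R)),
                        rng.standard_normal((J, R)),
                        rng.standard_normal((K, R)))
print(T.shape)  # (4, 5, 6)
```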

Expressive power of recurrent neural networks

Deep neural networks are surprisingly efficient at solving practical tasks, but the theory behind this phenomenon is only starting to catch up with the practice. Numerous works show that depth is the key to this efficiency. A certain class of deep convolutional networks – namely those that correspond to the Hierarchical Tucker (HT) tensor decomposition – has been proven to have exponentially hi...

Full text
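To make the HT correspondence mentioned above more tangible, here is a minimal numpy sketch of a balanced Hierarchical Tucker style decomposition of an order-4 tensor: pairs of leaf modes are merged by internal transfer tensors, and a top-level weight vector combines the results. The ranks and variable names are illustrative assumptions, not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, r0, r1 = 3, 2, 2  # mode dimension, top-level rank, leaf rank (assumed)

# Leaf factor matrices, one per mode of the order-4 tensor.
U1, U2, U3, U4 = (rng.standard_normal((M, r1)) for _ in range(4))
# Internal transfer tensors merging mode pairs {1,2} and {3,4}.
G12 = rng.standard_normal((r0, r1, r1))
G34 = rng.standard_normal((r0, r1, r1))
lam = rng.standard_normal(r0)  # top-level mixing weights

# Order-2 intermediate factors, one per top-rank index k.
phi12 = np.einsum('kab,ia,jb->kij', G12, U1, U2)
phi34 = np.einsum('kab,ia,jb->kij', G34, U3, U4)

# Full tensor: A = sum_k lam[k] * (phi12[k] outer phi34[k]).
A = np.einsum('k,kij,kpq->ijpq', lam, phi12, phi34)
print(A.shape)  # (3, 3, 3, 3)
```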


Publication date: 2016